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Abstract 

Statistical applications often involve the calculation of intractable multidimensional 
integrals. The Laplace formula is widely used to approximate such integrals. How¬ 
ever, in high-dimensional or small sample size problems, the shape of the integrand 
function may be far from that of the Gaussian density, and thus the standard Laplace 
approximation can be inaccurate. We propose an improved Laplace approximation 
that reduces the asymptotic error of the standard Laplace formula by one order of 
magnitude, thus leading to third-order accuracy. We also show, by means of practical 
examples of various complexity, that the proposed method is extremely accurate, even 
in high dimensions, improving over the standard Laplace formula. Such examples also 
demonstrate that the accuracy of the proposed method is comparable with that of other 
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existing methods, which are computationally more demanding. An R implementation of 


the improved Laplace approximation is also provided through the R package iLaplace 
available on CRAN. 

Keywords: Asymptotic expansions for integrals; Bayes Factor; Conditional minimisation; 
Integrated likelihood; Normalising constant; Numerical integration. 

1 Background 

Statistical applications often involve the evaluation of finite integrals of the form 

I n = f e- hn{x) dx , (1) 

JjR d 

where h n (x) is a smooth and concave real function, with x a d- dimensional real vector, indexed 
by n > 0. For instance, in Bayesian analyses, -h n (-) may be the log-likelihood or the log- 
posterior kernel and (JT|) is the Bayesian marginal likelihood or the posterior normalising 
constant. Furthermore, in Generalized Linear Mixed Models (GLMM) -h n (-) may represent 
the log-likelihood plus the log-density of the random effects. In this case, © gives the 
marginal likelihood for the parameters (6,9 U ), which can be generally written as 

L{9, 9 U ] y) = I L(9;u,y)f(u-,9 u )du 

= j exp{log L{6\ u, y) + log f(u; 9 U )} du 

= < 2 > 

where /(n; 9 U ) is the density of the random effects indexed by the parameter Q u , and L{6\ u, y) 
is the likelihood for 6 based on the conditional density of y given u. The quantity n is related 
to the information in the sample, and is often the sample size. 
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Integral ([[]) is_ frequently intractable but it can be approximated by several methods 


(see, e.g., 

Evans & Swartz 

, 2000) 

Bleistein & Handelsman 

■ 1 

986 

, c 


2000). Here, we fo cus on the Laplace approximation (see, e.g., 


Chap. 8 and 


Small, 


2010 


, Chap. 6). Let x = (x i,..., Xd) be 


the unique minimum of h(-), where to ease notation hereafter we drop n from I n , h n {-) and 
related quantities. In addition, we assume that the Hessian matrix of h(-) at x , i.e. 

d 2 h(x) 


V = V{x) = 


dxdx T 

is positive definite. The Laplace approximation of ([I]) is second-order accurate, i.e., / = 
i L {l + 0(n~ 1 )}, with 

J L = (2n) d/2 \V\~ 1/2 H{x), (3) 


where H(-) = exp{—h(-)}; see, e.g.. iBleistein fe Handelsmanl (119861. p. 335) 


The Laplace approximation is widely used both in the 


matin g posterior densities and p osterior moments 


20091 ) or Bayes Factors (see, e.g., 


Kass .& Rafte ryi. 


see, e .g., 


integrating out random effects in GLMM (see, e.g^ 


marginal likelihoods in group models (see, e.g., 


Bavesian framewor 

c for 

approxi- 

Tiernev & Kadane 

• 

1986 

Rue et al. 


19951). and in the frequentist framework for 


Breslow fe Clayton! 


Barndorff-Nielsen fe Cox 


199311 or to compute 


Pace et al. 


1994 . Sect. 2.8; 


2OO0J. In addition, it has also 


tions of matrix arguments (IButler fe Wood . 


j een u sed to appro ximate hyp e rgeom etric func 


20021 ). Moreover, 


Ruli et al. 


(12014T ) propose a 


simulation algorithm which draws posterior samples by inverting the approximate cumulative 
distrib ution function based on th e Laplace approx i matio n for marginal posterior densities. 


Lastly, 


Martino et al 


(2 0111) and 


Rizopoujos et al 


(120091 1 apply the Laplace method in the 


context of survival analysis and joint modelling of survival and longitudinal data, respectively. 
In the standard asymptotic setting with d fixed and n —> oo, the Laplace approximation 
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(ED is second-order accurate. On the other hand, if also d is large, the asymptotic expansion 
requires more terms in order to achieve the same accuracy as in lower-dimensions. For in¬ 
stance, when (ED factor i ses as a product of d scalar identical integrals, the relative error is 


of order 0(d/n ) ((Small . 


2010 


, Sect. 6.9). However, in practice both d and n are fixed, and 


when d is large relatively to n it may be necessary to improve the accuracy of the standard 
Laplace method. Moreover, an unappealing feature of the Laplace approximation is that it 
does not account for skewness or kurtosis in the integrand function. Therefore, when the 
shape of the integrand is far from that of the Gaussian density, which can happen especially 
in high-dimensional or in small sample size problems, the standard Laplace approximation 
can be severely inaccurate. Example 13.21 in Sect. 3 shows an example in which the Laplace 
approximation fails dramatically, and with inaccuracy that deteriorates with increasing di¬ 
mensionality. 

A possible way to improve the Laplace approxi mation is thro ugh the inclusion of higher- 


order de rivatives o f h(-) in the Taylor expansions. 


context. 


Raudenbush et al. 


Lindlev (11980 ) uses this idea in a Bayesian 


(2000) propose a higher-order Lapl ace approx imation for GLMM, 


by considering derivatives of h(-) up to the the sixth order. 


Pace et al 


(|2006j) use a similar 


approach for approximating marginal likelihoods in group models. However, when d > 1, 


the computation of higher-order derivatives can be tedious. S imilar 


by the Bayesian Bartlett correcti on proposed 


Laplace approximation of 


by 


DiCiccio et al. 


strategies are pursued 


(119971 ). and by the corrected 


Shun & McCullagh (ll995l) . However, the former involves posterior 


expectations, which in practice must be approximated through Monte Carlo methods and the 


latter solution is designed for situations, such as models with crossed random effects, in which 
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the standard Laplace approximation may not be asymptotically valid. Another improvement 


of the standard Laplace approximation is proposed by 


Nott et al. 


(1200911 . in which (JT]) is 


approximated by a product of scalar blocks, after a preliminary variable transformation to 
achieve approximate orthogonality. 

In this paper we propose an improved Laplace approximation for integrals of the form 
(JTD that, unlike the standard Laplace formula, can account for skewness and non-Gaussian 
tails in the integrand function. Moreover, we show that the proposed method has relative 
approximation error of order 0(n~ 3 / 2 ), in a standard asymptotic setting in which the sample 
size n diverges and d is fixed. The core idea of the proposed method is to build an approx¬ 
imation of the normalised integrand through sequential and re-normalised ratios of Laplace 
approximations. Finally, an approximation of the target integral (jT]) is obtained indirectly 
by the ratio of the un-normalised integrand over the approximation of the normalised inte¬ 
grand, both evaluated at a specific point x. Essentially, the proposed approximation of (JT|) 
can be written as J lL = c/ L , where c > 0 is the improvement over the standard Laplace 
approximation, and / = J lL {l + 0(n ~ 3 / 2 )}. 

Compared to the standard Laplace approximation, the proposed method requires re¬ 
peated conditional minimisations and repeated evaluations of the log-integrand function and 
its Hessian matrix. Conditional minimisations can be computationally demanding in high 
dimensions. Therefore, an alternative version is introduced, which uses approximate con¬ 
ditional minima obtained through a first order Taylor series expansion around the global 
minimum. This alternative version reduces the computational time while keeping compara¬ 


ble accuracy with respect to the original version. Nevertheless, the most demanding task is 
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the computation of the global minimum, which is a requirement also for the standard Laplace 
method. 

The rest of the article is structured as follows. Section 2 introduces the improved Laplace 
approximation. Section 3 illustrates the method in examples in which comparison with 
alternative approximations are also given. Section 4 concludes with some final remarks. 


2 The improved Laplace approximation 


Let p{x) = H{x)/I be the density function which corresponds to the kernel H{x) = exp{ — h(x)} 
with normalising constant /. By the identity 


I = 


H{x ) 
p{x) 


(4) 


if p(x) is known then / is readily available, for an arbitrary x. Alternatively, if a suitable 
estimate p{x) of p{x) is available, (J4j) provides an estimate I of /, given by H(x)/p{x). 


or instance, (HD has been used to es 


Chib, 


1995 


Chib & Je 


Markov random fields ( Rue et al 


iazkovl. 


2001 


imate Bayesian marginal likelihoods in MCMC settings 


Hsiao et 


2004, 


2009!). 


al 


20 041), and to approximate hidden Gaussian 


While 


holds for any x, it is advisable to locate such a point at a high density region 


(IChibl . 119951 ). One possibility is to choose x = x, w hich ma y be a lso convenient from a 


computational point of view. In MCMC settings, 


Hsiao et al. 


(120041 ) show that coordinate 


points other than x may improve the approximation error. However, locating such points 
can be computationally intensive. 

Let Xi :q = (aq,... ,x q ) be the first q and x q+ \-d the last d — q components of x (q < d ). 


6 













































Moreover, let x Xl be the conditional minimum of h(-) with x\ hxed and let x 


%l:q j^q+l 


be the 


conditional minimum with x\ :q hxed at x\- q and x q+ \ hxed. We require that h(-) satisfies the 


usual regularity conditions for the validity of the Laplace approximation (see, e.g., 


Kass et al. 


19901 ). 


Write p(x) as 


P( x ) = p Xl {x l ) X Px 2 \X 1 (x 2 \x 1 ) X • • • X Px d \Xi :d _i 

IIRd-1 H(x) dX 2 :d JlR d - 2 H{x) dX3:d 


„H d -2 ±± W H(x) 

X -—— - X • • • X 


(5) 


f R d H{x) dx H(x) dx 2 ,d f R H(x) dx d 

An improved approximation of p(x) can be obtained by approximating the integrals of each 
ratio on the right hand side of (J5J) through the Laplace formula. Specifically, the Laplace 
approximation of the marginal density px i(%i) is 

B(xi,x Xl ) f |K| 


1/2 


PxAx 1 ) = 


H{x) I 27r|V2:d(xi, x Xl )| 


( 6 ) 


where V 2 : d(-) is the block (2 :d, 2 :d) of V(-). This result is due to 


Tiernev & Kadanel ( 19861) . 


For the gth conditional density in (|5]) (2 < q < d), we apply the Laplace approximation to 
the numerator and the denominator and obtain 


PXq\X 1 :q -A X q \Xl-. q -l) 


PX 1:q (x 1:q ) 


(7) 


Px L : q -1 ( xi : q — l ) 

Finally, the conditional density Px d \x 1 . d _ 1 (•)> approximated by applying the univariate Laplace 
method to the integral in the denominator, is 


Px d \x 1:d _ 1 (x d \x 1:d _i) - 


H(x) 


H (x\ d— 1 , x Xl:d _ 1 ) 


Vd:d(pE\-.d— I; *7: 

27 




1/2 


( 8 ) 


where Vd-d(-) is the ( ith ele ment of the diagonal of V(- ) . Usin g results and under the assump¬ 


tions of 


Kass et al. 


( 19901 ) and of 


Tierney fe Kadane (1l986f ). it is possible to show that 
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(J7J) and ([HD have overall relative error of order 0(n _1 ). Recalling that, to use identity 0 we 
only need an approximation p(x) of p(x ), we might be tempted to take as p(x) the product 
of ([H]) times (0 (for 2 < q < d) times (0, all evaluated at x. However, such a product, when 
replaced in 0, reproduces exactly 0, the Laplace approximation of I. 

To achieve third-order accuracy we propose to re-normalise numerically 0, 0 and 0. 
Re-normalisation of 0 and 0 entails the evaluation of multi-dimensional numerical inte¬ 
grations. While this is true in general, in our case we only need an approximation for p(x) 
and therefore it is still possible to re-normalise 0 and 0 by using only scalar numerical 
integration. The key point is to fix all the conditioning variables at the corresponding modal 
values prior to the re-normalisations, as explained in Scheme |T] 

The product of the re-normalised versions of 0, 0 and 0, evaluated at x, gives a 
third-order approximation of p(x), as shown by the following theorem. 


Theorem 2.1 Under the assumptions of 


Kass et al. 


lil 99 (a) for the regularity of the Laplace 


approximation, the improved Laplace approximation of p(x) has third-order accuracy, i.e. 


p(x) = p zL (x){l + 0(n 3 ^ 2 )}. 


Proof The first step is to show that the approxim ation error o f 0 and 0 holds uniformly. 


The uniformity of 0 has been alreac 
more, on the basis of Theorem 6 of 


V shown bv 

Kass et al. 

Kass et al. 

( 

1990 

), we 


(1990, Theorem 6). Further- 


uniformly. This is immediate as 0 is the ratio of the Laplace approximation of the marginal 
density of Xi :q over that of Xi, q _i (2 < q < d ), both with relative error of order 0{n~ l ) 
holding uniformly. Hence the error in 0 is also uniform. 















Step 1 Compute ci, the normalising constant of (EJ); 

Step 2 For each q (2 < q < d), compute c q , the normalising constant of (JTJ) with all thd 
conditioning variables fixed at their modal values; 

Step 3 Compute c r n the normalising constant of (jHJ) with all the conditioning variables 
fixed at the corresponding modal values; 

Step 4 Set p lL (x) = ^4^- jllqla | 4 x d\xv.d-i) ag ^j ie unproved Laplace 

approximation of p(x); 

Step 5 Finally, get the improved Laplace approximation J lL = H(x)/p lL (x) of /. 

Scheme 1: Pseudo-code description of the improved Laplace method. 


Tierney fc Kadane ([1986.) show that, after numerical re-normalisation, (j6j) has error of 
order 0(n~ 3 ^ 2 ). This is because the 0(n -1 ) term, when uniform, gets absorbed into the 
normalising constant. To complete the proof we need to show that also ([TJ), after numerical 
re-normalisation with respect to x q and with the conditioning variables fixed at the corre¬ 
sponding modal values, has error of order 0(n ~ 3 / 2 ). To prove this, let P* Xq \x 1 . q ^ 1 ( x g\^x. q -i) 
be the re-normalised approximate conditional density of X q \X\ :q _\ with the conditioning 
variables fixed at the modal values, i.e. 


Px q 


Cq l PX i: g QCl = q-l,Sq) 

Cq— lPX\. q -i 


where c q = J Mq Px 1:q (x 1:q ) dxi :q and c q -i = ^^px^^Xi-.q- i)dzi :(? _i. This re-normalised 


9 









approximate conditional density is third-order accurate, i.e. 


PXq\Xi:q-i \X q \Xl :q — 1 ) 


_C q 1 pX 1 :g (Xl:g-l,Xg){l + Q(n 3/2 )} 

C q-lPX 1 :q -i(xi :q -l){l + 0(n ~ 3 / 2 )} 

: + 0 (n- Z ' 2 )} 

+ 0(n~ 3/2 )} 


However, note that there is no need to compute c g _i and c q , because 


P*X q \X 1 :q ( X q\Xl:q-l) 



Cq'px^gjxi-.g-l^q) 

C-hPx^g^V.q-l) 

<% 1 PX 1 . a (.X 1: g- 1 ,X q ) 

-1 . q - 7Z -rd X q 

c lPXl _ 1 (Xl:g- 1 ) y 


PX 1 : q (Xl: q -l,X q )/PX 1 :q - 1 (Xl:g-l) 

J n PX 1:v (x 1 .. q - 1 ,X q )/px 1:q _ 1 (xi:q-l)dx q 

PX 1 :q (Xl-.q-l,X q )/p Xl:q _ 1 (Xl:,-l) 


That is, we need only to re-normalise (j7J) with the conditioning variables fixed prior to inte¬ 
gration at their modal values, i.e. to compute c q . Note that, after numerical re-normalisation 
with the conditioning variables fixed, approximation (JS]) becomes exact. □ 


Finally, the replacement of p(x) with p lL (x) in (J4j) delivers the improved Laplace ap¬ 
proximation of (flj) . Or equivalently, the improved Laplace approximation can be written as 
J lL = c/ L , with c = nil Q and c* (1 < i < d) defined in Scheme [Q 


Remark 1 The factor c is an index of the magnitude of the improvement of the proposed 
method over the standard Laplace approximation. Indeed, values of c close to 1 indicate 
that the improved Laplace approximation is not improving over the standard Laplace. In 
this case, it is likely that the integrand is Gaussian-like. On the other hand, c ^ 1 indicates 
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that the integrand may not be Gaussian-like, e.g. it may be skewed and/or heavy-tailed. 
Note also that if / factors as the product of d scalar integrals, then the improved Laplace 
approximation of / corresponds to its computation via numerical integration. 


Remark 2 An important diffe rence o f t 


Laplace approximation (INLA) of 


Rue et al 


r e prop osed method from the integrated nested 


((2009) is that our method is specifically designed 


to approximate normalising constants or marginal likelihoods for general models, and for 
arbitrary components of the parameter for both Bayesian and frequentist inference. For 
instance, the method can be used to approximate (J2J) even when random effects are not 
necessarily Gaussian. On the other hand, INLA is designed for approximating marginal 
posterior distributions and ca n approx i mate o nly Bayesian marginal likelihoods of Gaussian 


latent fields (see, Eq. (30) of 
provided in Section 1X31 


Rue et al 


20091 ). Some numerical comparison with INLA are 


Remark 3 The order (aq,..., aq) is arbitrary, that is, the asymptotic error of the improved 
Laplace is not affected by their permutation. In practice, however, it may be useful to order x 
according to the cardinality of the arguments of each element of S/h(x), the gradient of h(-). 
In particular, the element of x for which the corresponding element of Vh(x) depends on all 
elements of x may be placed as the first factor in ([5]). For instance, if d — 3 and dh(x)/dx i 
depends on aq, dh(x)/tdx 2 depends on aq :2 and dh(x)/dx 3 depends on x, then the order 
(aq,aq,aq), with aq being the first factor in (15]), can simplify the conditional minimisations 
required by the proposed method. Obviously, when each element of Vh(x) depends on aq :3 , 
there is no preferred ordering. In the examples considered in Section 3 we did not experience 
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any practical difference in the results across different permutations of x. 


Remark 4 Conditional minimisations can become computationally demanding when d is 
large. Nevertheless, it is possible to avoid them by cons idering a first-order Taylor series 


expansion of the conditional minima as in 


Cox fe Wermuthl (119901) . In particular, let x = 


(y,z), where y is the fixed block and z the remaining part of x. Then z y , the conditional 
minimum of h(-) for fixed y, can be approximated by the linear regression 


l t =z + V„ V, y (y - y), 


(9) 


which is such that z y — z y = 0(n 1 ). Recently, 


Kharroubi & Swee ting ( 20161) ap plied a similar 


idea in a different context and noted excellent performance (see also 


Ruli et al. 


2 0141) . We 


explore the numerical performance of the improved Laplace approximation with approximate 
conditional minima z y in place of their exact version z y in Examples 13.2113.3113.51 and 13.61 


Remark 5 Linear constraints Ax = b, with A and b being appropriate matrices and vectors 
respectively, can be handled through a change of variables problem and by applying the 
proposed method to the remaining free components of x. Finally, when the distribution of 
x\\x 2 is more Guassian-like and X 2 is low-dimensional, as it happens in the INLA framework, 
then it may be more sensible to approximate X\\x 2 by the proposed method and integrate 
out x 2 by numerical integration. 
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3 Examples 


The impro ved Lap lace 


iLaplace flRuli et al, 


approximation is implemented in the R (JR Core Teaml . 2016) package 


20161) . available on the CRAN repository. Except for the example of 


Section 13.11 computations with the improved Laplace approximation, with either exact or 
approximate conditional minima, are performed in parallel over 11 threads through the par¬ 
allel implementation provided in the package iLaplace. Essentially, it is an embarrassingly 
parallel implementation in which each integral in Steps 1-3 of Scheme [Tj is computed through 


a separate thread. 


3.1 Gompertz distribution: fixed d and n —>■ oc 

Consider the sequence of sample sizes {n* = + l^yX-i], = 20, and i — 2,..., 30}, 

where the symbol [•] denotes the ceiling function. Let y l — (yi, ... ,y ni ) be a random sample 
of size rii (i = 2,..., 30) from the Gompertz distribution, with density 

p(y; 9) = a[3eP y e a exp{— ae^ y }, 

with 9 = (9i,6 2 ) = (loga, log /3) G 1R 2 and y > 0. Moreover, for each i, consider 100 
random datasets of size n^, from the Gompertz distribution with a = 2 and /3 — 3. For 
each of these datasets, we compute the normalising constant of the posterior distribution 
n{6\y l ) oc L{6\ y l )7r(9), where L(9;y l ) is the likelihood function for 9 based on data y l and 
7 t(9) = 7t(9i)tt(92) is the prior distribution with tt(9 i) and ^r(9 2 ) both being N( 0,100). 

The aim is to compare the behaviour of the standard (J L ) and the improved (/ lL ) Laplace 
approximations with the target value (I) computed by adaptive numerical integration, as the 
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sample size n diverges. Similarly to 


Diciccio & Young (2008) and 


Davison et al. 


(1200611 . let 


a\ > 0 , 02 > 0 , (fei, 62 ) G IR 2 and suppose that 


I = i iL (l + b in - ai ) + o(n~ ai ), 

and that 

I = J L ( 1 + b 2 n~ a2 ) + o(n“ a2 ), 

for n —y 00. Then, lim {/ lL /J} = 1 and lim {I L /I} = 1, and if the improved Laplace ap- 

n—>oo n—> 00 

proximation is more accurate than the standard Laplace, then the first limit should converge 
faster. Furthermore, a log-log graph of |/// lL — 1| (|/// L — 1|) against n should be be linear 
with slope —Oi (—a 2 ) and intercept log 1611 (log |& 21 )- 

The left panel of Figured] reports the log-log plot of J lL // and J L /J, both averaged across 
the 100 repetitions at each value of and against the sample size n. This plot highlights that 
the improved Laplace approximation is more accurate than the standard Laplace method, 
since the visual convergence at 1 of I lL /I happens at a faster rate. 

The log-log plot of the relative error averaged across the 100 repetitions at each value of the 
sample size n is shown on the right panel of Figured! This shows that the improved Laplace 
method achieves third-order accuracy whereas the standard Laplace formula is second-order 
accurate, e.g. a\ = 1.5 and a 2 = 1. 


3.2 Multivariate t/skew-t 


To assess the accuracy of the prop osed m ethod even in extreme settings, we consider the 


multivariate t/skew-t distribution of 


Jonesl (120021 1. with density 
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Figure 1: Numerical evaluation of the asymptotic error of the improved and standard Laplace 
methods. Left panel: log-log plot of (J lL /J) (•) and (J L /-0 (°) versus n. Right panel: log-log 
plot of the relative error |/ lL /J — 1| (•) and \I L /I — 1| (o) versus n; the solid and dashed lines 
are the corresponding least-squares regression lines. The empirical slope is -1.51, with 0.99 
confidence interval (-1.53, -1.48), for the solid line, and -1.01, with 0.99 confidence interval 
(-1.09, -0.93), for the dashed line. 
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p(y- is, a, c) = 


r((i/ + d)/2) 


r((i/ + l)/2)R(a, c)(a + c) 1 /22“+c-i( z/7r )(d-i)/2 

<1 + -W— 0 + «^)“ +I/2 (l - «^) C+I/2 


X 


{1 + u~ 1 (yf + ... + y 2 d )Y v+d)/2 


The positive parameters a and c determine the distribution of the skewed marginal, the 
parameter is, i.e. the degrees of freedom (df), controls the tail behaviour of the distribution, 
and and T(-) are the beta and gamma functions, respectively. This distribution is 

obtained from the multivariate Student’s f-density centred at 0 and with identity scale matrix, 
where the marginal density of the first component is replaced with the univariate f/skew- 
t density. The case with a = c = is /2 leads the the ordinary multivariate Student’s t- 
distribution with identity scale matrix and v degrees of freedom. 

We approximate the normalizing constant of the multivariate f/skew-f density, with the 
standard and the improved Laplace approximations, in two scenarios: the first with a = c = 
1.5, and the second with a = 12 and c = 0.5. For each scenario, we consider multivariate 
t/skew-t densities with varying dimension and degrees of freedom. Results in the first row of 
Figure [2] show that the standard Laplace approximation rapidly deteriorates with increasing 
dimensionality, and increasing non-Gaussianity, i.e. low is, higher skewness (large a and small 
c or vice versa). On the contrary, the improved Laplace approximation is reasonably accurate 
and stable across both in creas ing dimensionality and non-Gaussianity. A similar example has 


been considered also by 


Nott et al. 


(1200911 . in order to test the accuracy of their modified 


Laplace approximation. However, their method is substantially less accurate then ours, only 
slightly improving the poor quality of the standard Laplace approximation. For instance, the 
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normalising constant of the 10-variate t/skew-t density with a = 4, c = 1 and v = 3^ is_0.013 


Nott et al. 


(120091) and 


with the standard Laplace, 0.02 with the modified Laplace method of 
0.9981 with the improved Laplace approximation. 

Now consider the same example but using the approximate conditional minima introduced 
in Remark 4 of Section 2.1. As shown in the second row of Figure[2] in this case, the results of 
the improved Laplace approximation with either actual or approximate conditional minima 


coincide. 


3.3 A comparison with INLA 

The following example has been considered by 
challenging for the INLA methodology. 


Ferkingstad fe Rue (.2015) and is known as 


Let y = (yi,, y n ) be conditionally independent binary values with 


Yi\ui Bernoulli , 
logit (p^ = P + Ui , 

where Ui ~ iV(0, a 2 ), for i = 1,..., n. We assume independent priors for /? and v = a -2 , with 
j3 ~ iV(0,1) and v ~ Gamma(l, 1) and we wish to obtain the marginal posterior distributions 
of /3 and is. 

We apply the improved and the standard Laplace methods to approximate L(/3, is ; y), the 
marginal likelihood defined in (J2D under the aforementioned modelling assumptions. Since 
L(/3, is; y)n(/3, is) is bivariate, we use adaptive numerical integration for obtaining the marginal 
posteriors n((3\y) and ir(is\y). For comparison purposes, the marginal posteriors are also ap¬ 
proximated by: MCMC, the standard version of INLA and the improved INLA proposed by 
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Figure 2: First row: standard (red coloured symbols) and improved Laplace (black) ap¬ 
proximations of the normalizing constant (in logarithmic scale) of the multivariate t/skew-t 
density against dimension. Second row: improved Laplace approximation with either exact 
or approximate conditional minima overlap. Degrees of freedom are: 3 (o), 5 (A), 10 (+) 
and 20 (x). 
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Ferkingstad & Rue (120151) . For 


samples with the JAGS software (iPlummerl . 


he MCMC app roximation, we consider 10 6 final posterior 


2 0131 1. after a burn-in of 10 6 samples. Computa¬ 


tions with INLA are done using the associated R package R-INLA, using the default options. 

Note that the integral over u = (iii,... ,u n ), required for obtaining L(f3,v;y), can be 
factorised as product of n scalar integrals. Hence, in this case the improved Laplace method 
is as accurate in approximating v\y) as is numerical integration. 

As an illustration, consider a sample of size n = 100 generated from the model with 
a 2 = 1 and j3 = 2. As a gold standard we use MCMC as implemented in the JAGS software. 
We compare it with the im proved and th e stand ard Laplace approximations and with INLA 


and the improved ILNA of 


Ferkingstad & Hue (120151 1 in the first row of Figure [3j As ex¬ 


pected, the improved Laplace approximation is virtually indistinguishable from the MCMC 
approximation. On the other hand, the Laplace approximation and both versions of INLA 
perform slightly worse then the improved Laplace approximation. 

The second row of Figure [H] shows the marginal posteriors of (3 and log v approximated by 
the improved Laplace method using the approximate conditional minima introduced in Re¬ 
mark 4. In this case, the improved Laplace approximation with either exact (black continued) 
or approximate (red dashed) conditional minima gives indistinguishable results. 

A small simulation study is performed in order to assess the accuracy of the proposed 
method. In particular, we consider 100 datasets with sample size n = 100 drawn from the 
model and under the same parameter values as before. For each dataset, we compute the 
marginal posteriors of 6 by MCMC, here treated again as the gold standard, and by: the 
standard Laplace, the improved Lapalce (with approximate conditional minima) and the 
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log(v) 



log(v) 



Figure 3: First row: marginal posterior distributions for f3 and \ogv approximated by: 10 6 
MCMC samples with JAGS (histogram), Laplace (dash-dotted red curves), improved Laplace 
(solid black), INLA (dashed blue) and improved INLA (dotted magenta) methods. Second 
row: marginal posteriors of (5 and log v with the improved Laplace approximation with exact 
(black) and approximate conditional minima (red dashed). 
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original and corrected versions of INLA. The MCMC approximation is done by 10 6 samples, 
after a burn-in of 10 5 , and thinning equal to 10. As a measure of discrepancy, we compute the 
Kullback-Leibler (KL) divergence between the MCMC posterior and the other approximation 
methods. The KL divergence is defined as 

where n(6\y) is the MCMC posterior and n(9\y) is the approximate posterior obtained with 
the other methods. For simplicity, we compute two marginal KL divergences, i.e. one for 
/3 and one for v. The higher the KL, the worse is the approximation 7 r(-| y). The MCMC 
marginal posteriors are computed with logspline density estimation using the logspline 
package of R. This tends to give smoother density estimates than usual kernel density es¬ 
timators. The marginal posterior distributions obtained with either standard or improved 
Laplace approximation are available analytically, whereas those based on the two versions of 
INLA are build through smoothing splines. 

The results in Figure H] highlight that the marginal posteriors of /3 (left panel) and v (right 
panel) approximated with the proposed method are the closest to the MCMC posteriors in 
terms of the KL divergence. 

3.4 Nonlinear regression 

Consider the nonlinear regression model 

y% = A exp{l - exp(-Xi/f3 2 )} + ae u 
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Figure 4: Log-KL divergences of marginal MCMC posteriors of (3 (left panel) and v (right 
panel) from the corresponding ones approximated through the: Laplace (LA), improved 
Laplace (iLA), INLA and the corrected INLA methods in a repeated sampling context with 
100 replications. 


where y t is the response variable, re, is a covariate, (/?i, (3 2 ) are unknown regression parameters, 
and the e* are independent error terms, i — 1,... ,n. We focus on two possible distributions 
for the error term: the normal distribution and the Student’s t- distribution, with unknown 
degrees of freedom r. The aim is to choose among them through the Bayes Factor (BF), 
which in our case is given by the ratio of the posterior normalising constant of the normal 
model over that of the Student’s t model. 


As an example we consider the BOD2 dataset flBates fe Wattsl. Il988l p. 305), which 
concerns a study on biochemical oxygen concentration (y) as function of time (x). For both 
models, we assume the parameters are a priori independent. Moreover, a bivariate normal 
distribution with m ean vector zero and scale matrix IOI 2 , is assumed for (3. Following the 
recommendations of Gelmanl (120061 ). for the scale parameter cr we assume a half-Cauchy prior 


22 































with scale equal to 10. Finally, the Jeffreys rule prior proposed by (Fonseca et al . (120081 1 is 
taken for r. For numerical stability in the optimisations, a and r are considered in logarithmic 
scale. 

Table [1] shows the log-marginal likelihoods and the Bayes factor approximated by the irn- 


proved Lap l ace 


(DiCiccio et al 


appro ximation and by: the standard Laplace, the Bartlett-correc t ed La place 


19971 ). importance sampling, the method of 


Chib & Jeliazkov (20011 and 


adaptive numerical integration as implemented in the R package cubature. The 


Chib & Jeliazkovf s. 


IS and Bartlett-corrected Laplace approximations are replicated 500 times, where for each 
replication the MCMC algorithm is started at a different point. The final estimates of the 
log-marginal likelihood and of the BF (in decimal logarithmic scale) are obtained by averag¬ 
ing the 500 replications. At each replication, the Bartlett-corrected Laplace approximation 
is performed with 2 x 10 5 final MCMC posterior draws, after suitable burn-in and thinning 
to reduce autocorrelation. The sa me MCM C posterior sample is used also for computing the 


marginal likelihood with 


Chib & Jeli azkovr s method. For the IS approximation we consider 


10 6 draws from the multivariate Student’s ^distribution with m degrees of freedom, centred 
at the posterior mode, with scale matrix equal to c > 0 times the inverse of the posterior 
Hessian at the modal value. Several values of c around 1 and m e [3, 50] were also con¬ 
sidered. However, they gave very similar results, so we set c = 1 and m = 3. This choice 
permits to h ave an importance density with finite variance and with heavy tails (see, e.g., 


Evans fe Swartz] , 


2000 


, Sect. 6.3). The standard deviation of the 500 log-marginal likelihoods 


divided by \/500 is taken as a measure of Monte Carlo standard error. 

Results in Table [D indicate that the standard Laplace approximation and its Bartlett- 
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Model 

Adaptive 

Laplace 

Improved 

Bartlett- 

Chib & 

IS 


integration 


Laplace 

corrected (3x SEmc) 

Jeliazkov (3x SEmc) 

(3xSE M c) 

Normal 

-2.539 

-2.905 

-2.540 

-2.504 (0.0198) 

-2.537 (0.003) 

-2.539 (0.0001) 

Student’s t 

-2.488 

-5.179 

-2.449 

-4.188 (0.0199) 

-2.478 (0.0031) 

-2.457 (0.0001) 

logi 0 BF 

-0.022 

0.988 

-0.039 

0.731 (0.0103) 

-0.027 (0.0057) 

-0.036 (0.0183) 

(Normal vs 

t) 







Table 1: B0D2 data. Logarithm of Bayesian marginal likelihoods and BF computed 
with: adaptive numerical integration, standard Lap lace, the im prove d Laplace, the Bartlett- 
corrected Laplace, importance sampling (IS) and Chib & Jeli azkovr s method. For the last 


three methods, the point estimates are obtained by averaging over 500 replications; the quan¬ 
tity in parenthesis gives 3 x SEmc, where SEmc is the Monte Carlo standard error given by 
the standard deviation divided by \/500. 


corrected version are q uite in accurate, s ince both lead to substantial evidence in favor of the 


normal model (see 


K ass & Raftery, 


1995 


for the interpretation of the BF). Such an evidence is 


not confirmed by the BF approximat ed t hrough adap tive numerical integration, here treated 


as the gold standard; neither IS and 


Chib fe Jeliazkov’s method confirm the aforementioned 


evidence. In ad ditio n, results of the improved Laplace method are in reasonable agreement 


with IS, 


Chib fe Jeliazkovl ’s approximation and adaptive numerical integration. 


The inaccuracy of the standard Laplace approximation in the case of the Student’s t model 
is most likely due to the non normality of the marginal posterior of (log a, log u). Indeed, a 
look at the bivariate kernel density estimate of this marginal bivariate posterior (not reported 
here) reveals that it is banana-shaped, and therefore it is quite far from being elliptical. 
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Nevertheless, such a shape is well accommodated by the improved Laplace approximation. 


3.5 GLMM with crossed random effects 


We consider the problem of approxima ting the marginal li 


a model with crossed random effects ( Shun fe McCullagh . 


c eliliood for the fixe d parameters in 


1995 : 


Shun, 


is useful, for instance, when analysing the Salamander matin g data 


199711. Such a model 


1989 


p. 439). 


Booth fe Hobert 


'liese c 


(11999T1 . 


ata h ave been analys e d by 


Bellio & Varinl ( 2005 1. 


Karim fc Zeger ( 19921 1. 


(IMcCullagh &; Nelden. 


Shun ( 1997 1. 


Sung & Geverl (120071) . among others, and 


consist o f th ree separate e xperiments, each performed according to the design given in 


McCullagh & Wider (1989 


, Table 14.3). Each experiment involved matings among sala¬ 


manders in two closed groups. Both groups contained five species R females, five species 
W females, five species R males and five species W males. Within each group, only 60 of 
the possible 100 heterosexual crosses were observed owing to time constraints. Thus, each 
experiment resulted in 120 binary observations indicating which matings were successful and 
which were not. 


As in 


McCullag h & Nelde r (11989 


, p. 441), the data are modelled as if different sets of 20 


male and 20 female salamanders were used in each experiment. Let y %3 be the indicator of 
a successful mating between female i and male j, for i, j = 1,..., 60, where only 360 of the 
(i,j) pairs are relevant. Let u{ denote the random effect that the Ah female salamander has 
across matings in which she is involved, and define U™ similarly for the jth male. The data 
Uij are assumed conditionally independent with 


Yij\u{, u™ Bernoulli Cm), 
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logit (py) = x]^ + u{ + uf, 


where Xij is a 4-dimensional row vector of zeros and ones indicating the type of cross, /3 is 
the vector of fixed effects, Uj ~ 1V(0, aj) and UJ l ~ 1V(0, 

As a first example we consider the estimatio n of (3 and ( aj, J for each separate experi¬ 


ment - following the same model structure as in 


S hu n (119971 ) - performed by maximising the 


approximate marginal likelihood. The aim is to compare the maximum likelihood estimate 
(MLE) based on the marginal likelihood approximated by the imp roved Lap l ace method wit 


those based on 
and 


he modified Laplace approximation proposed by 


Shu n & McCullagh (119951 ) 


Shun ( 19971 ). It is well known that in models with crossed rando m e ffects the standard 


19951 ). and it may 


Laplace approximation is not asymptotically valid dShun fe McCullaghl . 
give poor results. 

Let /3 = (/3 0 , Avs/j Avs m) Avs/xws m ) be the fixed effects, where /3 0 is a constant, ( 5 W s / is 
the effect of the dummy variable WS f which takes one if the observation is from a species 
W female and zero otherwise, and so on. The marginal likelihood has the form (J2]) , with 
6 = f3 and 6 U = (a|,cr^) and involves a 40 dimensional integral that cannot be reduced to 
a product of lower dimensional integrals, even though the random effects Uf = ( u [,... ,u{q) 
and u m = (u ™,..., u™ 0 ) have independent normal distributions. The approxim ate M LE for 


Shun (11997 . 


the three separate experiments (reported in Tab [2]) are compared with those of 
Tab. 2 and Tab. 3). 

From Table [2] we notice that the standard Laplace method is very fast but the resulting 
MLE are quite inaccurate as far as variance components parameters are concerned. On the 


other hand, approximate MLE obtained with the improved Laplace method, with either 
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Approximate MLE 


Methods 

A) 

Pws f 

PwSm 

PwSfXWSm 

a } 

Laplace: 






Exper. 1 

1.34 

-2.94 

-0.42 

3.18 

1.58 

Exper. 2 

0.57 

-2.46 

-0.77 

3.71 

1.81 

Exper. 3 

1.02 

-3.23 

-0.82 

3.82 

0.35 

Modified Laplace of Shun 

(1997l a : 



Exper. 1 

1.37 

-3.02 

-0.44 

3.27 

1.72 

Exper. 2 

0.57 

-2.53 

-0.77 

3.79 

2.10 

Exper. 3 

1.04 

-3.31 

-0.83 

3.90 

0.46 

Improved Laplace: 





Exper. 1 

1.37 

-3.02 

-0.44 

3.27 

1.74 

Exper. 2 

0.56 

-2.55 

-0.79 

3.77 

2.12 

Exper. 3 

1.03 

-3.30 

-0.82 

3.90 

0.49 

Improved Laplace with approximate conditional 

minima: 

Exper. 1 

1.36 

-2.99 

-0.44 

3.24 

1.72 

Exper. 2 

0.56 

-2.49 

-0.75 

3.72 

2.07 

Exper. 3 

1.02 

-3.27 

-0.82 

3.87 

0.43 


a Corrected(l) values taken from 


Shun ( 199711 


(J rn 


0.073 

0.92 

1.85 

0.185 

1.10 

2.07 

0.189 

1.14 

2.12 

0.15 

1.05 

2.03 


Sec. 


0.92 

0.72 

1.05 


397 

209 

145 

56 

64 

83 


N. of. Iter. 


22 

15 

22 


29 

17 

11 

15 

16 
20 


Table 2: Salamander data. Comparison of the improved Laplace method (with exact and 
approximate conditional minima) with the standard Laplace and the modified Laplace ap¬ 


proximation of 


Slum (119971 ). “Sec.” refers to the elapsed time in seconds and “N. of Iter.” 


refers to the number of iterations before convergence. 
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exact or approxim ate c onditional minima, are closer to those based on the modified Laplace 
approximation of Shun (I1997I ). However, the improved Laplace method is easier to compute 


since it does not require derivatives of the negative log-integrand beyond the second-order. In 
terms of computing time, the improved Laplace approximation with approximate conditional 
minima is much faster than the version with exact conditional minima, though slower than 
the standard Laplace approximation. 

Consider now the joint analysis of the Salamander data, by independently combining 
the three experiments’ data. In this case the marginal likelihood entails the computation of 
three 40-dimensional integrals. To compare our method with other results available in the 
literature, we consider a slightly modified version of the fixed effects. In particular, here f3 
is equal to (/3r/r, /3r/w, Av/r> Av/w)j where /3r/r denotes the effect of the cross between a 
species R female and a species R male, and so on. 

The approximate MLE obtained from the improved Laplace approximation with either 


exact or approxima te co nditional minima, the Monte Carlo Expectation-A/ 


Mj algorithm of lBooth fe Hobert 


(119991 ). the quasi-likelihood approach of 


aximisation (MC- 


Breslow fe Clavton 


19931 ). the standard 


sampling proposed by 


ap lace appro ximation and the posterior mean taken with the Gibbs 


Karim fe Zegerl (119921) are illustrated in Ta ble El The st andard Laplace 


approximation underestimates the variance parameters (see also 


Shun, 


1997|). The estimate 


of /3 and that of the variance parameters based on both versions of the im proved 


approximation are q uite similar to those of the MC-EM procedure of 


(see also 


Su ng & Geyer, 


Booth fe Hobert 


aplace 


(R999I) 


20071 ). However, compared to MC-EM, the proposed method does 


not require tuning from the practitioner. 






































Methods 



Approximate MLE 



A) 

Pws f 

Pws m 

PwSfXlVSm 

a f 


Laplace 

1.01 

0.31 

-1.90 

0.99 

1.17 

1.04 

Improved Laplace 

1.02 

0.32 

-1.95 

1.00 

1.39 

1.25 

Improved Laplace (approx, cond. min) 

1.01 

0.31 

-1.92 

0.98 

1.34 

1.19 

MC-EM (Booth & Hobert. 1999) 

1.03 

0.32 

-1.95 

0.99 

1.40 

1.25 

Gibbs (Karim & Zeeer. 1992) 

1.03 

0.34 

-1.98 

1.07 

1.50 

1.36 

PQL2 

0.87 

0.28 

-1.69 

0.95 

1.35 

0.93 


a From 


B ooth fe Hobcrti (11999T ) 


Table 3: Salamander mating data analysed jointly. Comparisons of the improved Laplace 
approximation (with either exact or approxim ate conditio nal minima) with the Monte Carlo 


Ex pectatio n-Maximisation (MC-EM) of 


Booth fe Hobert 


Karim & Zeger (119928 ). the quasi-likelihood (PQL) approach of 


(11999), t he Gib bs sa mpling of 


Breslow fe Clavtou |l993) 


and the standard Laplace approximation. 
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Figure 5: Salamander mating data analysed jointly. Approximate relative profile log- 
likelihoods for the variance components obtained with the standard Laplace approximation 
(black continued) and with the improved Laplace approximation with approximate condi¬ 
tional minima (red dashed). 

We notice that the approximate MLE based on the improved Laplace method with exact 
conditional minima is found within 9.5 minutes and after 14 iterations. Using approximate 
conditional minima, the approximate MLE is located within 3 minutes and after 15 iterations. 

We can use (J2]) also for conducting full likelihood-based inference. For instance, in the 
case of the Salamander data analysed jointly, let us consider profile likelihood-based confi¬ 
dence intervals for aj and crf n . Figure [5] depicts the aforementioned relative profile likelihoods, 
obtained with the standard Laplace and with the improved Laplace with approximate condi¬ 
tional minima. From this plot we notice that standard Laplace-based profile likelihoods for 
the variance parameters are narrower than those based on the improved Laplace approxima- 
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tion. For instance, the 0.95 confidence interval found by inverting the profile likelihood of aj 
and with the standard Laplace approximation are (0.38, 2.7) and (0.31, 2.46) while those 
based on the improved Laplace approximation are (0.46, 2.98) and (0.38, 2.69). 


3.6 GLMM with spatial random effects 


We c onsider the application a Poisson geostatistical model to the Rongelap dataset (IDiggle et all 


19981 ). This dataset reports counts %/i on radionuclide concentration over the length of time 


ti, at the spatial location s*, for i — 1,..., 157 different locations in Rongelap Island. On the 
basis of the theory of radioactive emissions, the count Y % at the n locations can be treated 
approximately as realisations of independent random variables with m ean A(s t ) = t ,ri(s,' 


where rj(si) measures the radioactivity at location Sj, i — 1,..., n. See 
and reference therein for further details. 


Diggle et al 


(119981 ) 


For these data, 


Dig gle et cd. 


(119981 ) propose the following geostatistical model 


Yi | An S , u(si) ~ Poisson(fjT^Sj)), 


( 10 ) 


logrj(si) = (3 0 + u(si) 

where, (f/(si),..., U(s n ) ~ N n ( 0, S), are spatial random effects, which are marginally nor¬ 
mally distributed with mean zero and covariance matrix E. Typically, estimation of a full £ 
is not possible, unless we place a proper prior on it, and some structure has to be imposed 
on it. Here we assume the exponential model, which implies that 
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= co v(U(si),U(sj) = o 2 exp{ —| |sj - sj || 2 /ct}, 

where a 2 is a variance parameter, a controls the correlation function. In our computations 
the distance matrix of the locations is divided by 100 in order to avoid numerical overflow 
problems in the computation of £. 

The aim is to fit model (HU to the Rongelap data by maximum likelihood estimation, 
where the parameter of interest is (/3o,a 2 ,a). The marginal likelihood of (/3o,a 2 ,a) can be 
recast in the form of (2), with f u (-]9 u ) given by the multivariate normal distribution with 
mean zero and the covariance matrix controlled by 9 U = (a 2 , a) and with L(9',u,y) the 
likelihood given by the conditional distribution of y given u and 9 = (3q. 

To approximate the marginal likelihood we use the Laplace method and the improved 


Laplace approximation with approximate conditional minima. For co m paris on 


approximate (2) also by importance sampling as proposed by 


Sung & Geverl (120071 ). Corn- 


purposes, we 


parison with INLA in this case is not possible as the R-INLA package provides only numerical 
approximations to the marginal posterior distributions and not to the full marginal likelihood 
(2). We use the multivariate Student’s ^-distribution as importance density. The location 
of the importance density is fixed at the mode of the conditional density of u given y and 
(/So, o' 2 , a) and the scale matrix of the importance density is fixed at the Hessian matrix 
of the negative logarithm of the conditional density of u given y and (/3 0 , a 2 , a). The IS 
approximation of (2) for a fixed (/3o,er 2 ,a) is 


o 2 , a; y) = m 1 ^ 

3 = 1 


L{Po\u{s)j, y)f u {s) (tx(g)j-;I!) 

R u ( s )j) 


32 






Approximate MLE 


Methods 

Laplace 

Improved 

Laplace 

Importance 

Sampling (3x SE M c) 

A) 

1.83 

1.83 

1.83 (3.6xl0 -6 ) 

a 2 

0.224 

0.302 

0.296 (2.2xl0 -4 ) 

a 

0.081 

0.104 

0.103 (4.5xl0 -5 ) 


Table 4: Rongelap data. Improved Laplace approximation (with approximate conditional 
minima) the standard Laplace approximation and importance sampling estimates. For the 
IS method, the point estimates are obtained by averaging over 50 replications obtained with 
50 different random seeds; the quantity in parenthesis gives 3xSE M c, where SE M c is the 
Monte Carlo standard error given by the standard deviation divided by \/50. 

where u(s)j is the jth random vector drawn from the importance distribution /(•), for j = 
1 and m is the overall number of random draws. To have an importance density 

with heavy tails we fix the degrees of freedom to 5. Furthermore, we fix the random seed 
in order to obtain a smooth approximation for the likelihood function. An issue with the 
Monte Carlo approximation is that the resulting estimate is subject to stochastic variability. 
To take this into account we consider m = 2 x 10 4 draws and compute the approximate MLE 
at 50 different seeds. Increasing m would give more stable results but at the cost of higher 
computing time. The final estimates are obtained by averaging the 50 approximate MLEs 
and the Monte Carlo standard error is also computed from these replications. 

Results, shown in Table U highlight that the standard Laplace approximation tend to 
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Figure 6: Rongelap data. Approximate relative profile log-likelihoods for a 2 obtained with the 
standard Laplace approximation (black continued), with the improved Laplace approximation 
with approximate conditional minima (red dashed) and with IS (green dot-dashed). The 
horizontal dotted line is the level which gives the 0.95 confidence interval for a 2 based on the 
log-profile likelihood ratio statistic. 

underestimate both a 2 and a as compared to the IS approximation, here treated as more 
trustworthy. On the other hand, the improved Laplace approximation gives similar results 
to the IS approximation. Such a behaviour is perhaps more clear-cut if we look at the 
relative profile log-likelihood function of a 2 , depicted in Figure [6] (the plots for a are similar 
and are omitted). This figure highlights that the standard Laplace approximation may 
produce misleading frequentist inference on the variance parameters, in terms of both, point 
estimation and profile likelihood-based interval estimation. 
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4 Discussion 


Although largely improving over the standard Laplace approximation, the proposed method 
is guaranteed to work only if the integrand is unimodal, with the mode being inside the 
domain of integration. This is because, if h(-) has either multiple minima or the minimum is 
not inside the domain of integration, then the determinants of blocks of its Hessian matrix 
may not be positive definite and the computation of dHJ) and (J7|) may break down. A possible 
way to deal with multimodal integrands is through a mixture of Laplace approximations, e.g. 
one Laplace approximation for each of mode, provided they can all be found. However, these 
issues are open problems and are left for future work. 

A convenient feature of the proposed method is that the d integrals can be easily computed 
in parallel. The numerical re-normalisations are an additional and difficult-to-quantify source 
of error. However, scalar numerical integration via carefully chosen adaptive quadratures is 
in general extremely accurate. 

The main computational burden of the method is due to conditional minimisations and 
Hessian determinants. Both can be greatly simplified by considering analytical first and 
second-order derivatives of h(-). This is the strategy adopted in the iLaplace package 
and throughout the examples. An alternative to analytical differentiation is the automatic 


differentiation, whip 


Fournier et al. 


1 provides on-line function differentiation during its evaluation (see, e.g., 


2 012 ). However, since automatic differentiation requires further programming 


efforts, we have not tried it in our package, though we plan to explore this possibility in future 


versions. 


The version of the method with approximate conditional minima showed good perfor- 
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mance and significant savings in terms of computing time in the examples considered. An¬ 
other alternative to this could be to use approximate conditional minima as starting points 
for the computation of the exact ones. Although this would speed-up the computation of 
conditional minima, the method might not be as fast as when using approximate conditional 
minima in place of the actual one. 

From a practical perspective, the improved Laplace approximation requires the integrand 
to be concave and unimodal but not necessarily symmetric or with Gaussian tails, though fur¬ 
ther assumptions are required to guarantee its asymptotic properties. In our experience, the 
standard Laplace approximation tends to work poorly when many variables of the integrand 
lay on the positive subset of the real numbers or when the dimensionality of the integrand 
increases with the sample size. Indeed, despite applying logarithmic transformations, such 
variables may still lead to asymmetric or heavy-tailed integrands. While in Bayesian applica¬ 
tions H(-) may not always be unimodal, in GLMM it is often unimodal. In many instances, 
with independent random effects, the standard Laplace approximation or numerical integra¬ 
tion with a few quadrature points are accurate enough for practical purposes. Indeed standard 
GLMM can now be fitted quite accurately by available R packages such as lme4. However, in 
models with complicated, dependent and/or crossed random effects, Laplace’s method may 
perform poorly, and numerical integration may require a large number of quadrature points, 
hence leading to a higher computational overhead. Our method seems particularly suited for 
these contexts, as was also demonstrated by the examples of Sections 13.51 and 13.61 

Finally, the improved Laplace approximation can be used to compute slightly modi fied K L 


divergences that arise in the variational approximation framework (see, e.g., 


Ormerod & Wand, 


36 





2010). In this context, the method can be useful for extending the usual Gaussian varia- 


tional approach to the use o f more flexible and non-conjugate densities, such as the skew-t 


flAzzalini fe Capitaniol . 


20031). This and the extension of the method to cases in which the 


mode lies outside the integration region are under investigation. 
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